Syntactic annotation of spoken utterances: A case study on the Czech Academic Corpus
Abstract
Corpus annotation plays an important role in linguistic analysis and computational processing of both written and spoken language. Syntactic annotation of spoken texts has clearly become a topic of considerable interest, driven by the desire to improve automatic speech recognition systems by incorporating syntax into the language models, or to build language understanding applications. Syntactic annotation of both written and spoken texts in the Czech Academic Corpus was created thirty years ago, when no other corpus of spoken texts, let alone an annotated one, existed. We discuss how relevant and inspiring this annotation is to the current frameworks of spoken text annotation.
Similar resources
Syntactic Feature of EFL Speakers’ Conference Presentations: The Case of Passive Voice and Pseudo-Cleft
Acquiring proficiency in academic genres is a key factor in the research community. Among the various genres in academic discourse communities, spoken genres, especially Conference Presentations (CPs), play a crucial role in research communities, though investigation of this important genre is still in its infancy. Therefore, the present study aims to shed light on the import...
Prague Dependency Treebank Annotation Errors: A Preliminary Analysis
This paper presents a basic analysis of syntactic annotation errors and inconsistencies in the Prague Dependency Treebank, the biggest corpus of Czech with manual syntactic annotation. The corpus is used for developing and testing many syntactic analysers of Czech, and the problems in the annotation have an essential impact on the evaluation of the quality of these parsers and the results of ...
Oral2008: New Balanced Corpus of Spoken Czech
Attention paid to spoken language has increased in recent decades, as has its importance for linguistic research and natural language processing in general. However, compiling spoken corpora, an indispensable source of data, is very laborious and thus expensive. Nevertheless, more and more spoken corpora are currently being created. There are various approaches to their design, dept...
Spoken Requests for Tourist Information: a Speech Acts Annotation
This paper presents an ongoing corpus annotation of speech acts in the domain of tourism, which falls within a wider project on multimodal question answering. An annotation scheme and set of guidelines are developed to mark information about parts of spoken utterances which require a response, distinguishing them from parts of utterances which do not. The corpus used for annotation consists of ...
Spanish Phoneme Classification by Means of a Hierarchy of Kohonen Self-Organizing Maps
Research Issues for the Next Generation Spoken Dialogue Systems; Data-Driven Analysis of Speech; Towards a Road Map for Machine Translation Research; The Prague Dependency Treebank: Crossing the Sentence Boundary; Text Tiered Tagging and Combined Language Models Classifiers; Syntactic Tagging; Information, Language, Corpus and Linguistics; Prague Dependency Tre...